Dimension-free Information Concentration via Exp-Concavity
Information concentration of probability measures has important implications
in learning theory. It was recently discovered that the information content of
a log-concave distribution concentrates around its differential entropy,
albeit with an unpleasant dependence on the ambient dimension. In this work, we
prove that if the potentials of the log-concave distribution are exp-concave,
which is a central notion for fast rates in online and statistical learning,
then the concentration of information can be further improved to depend only on
the exp-concavity parameter, and hence, it can be dimension independent.
Central to our proof is a novel yet simple application of the variance
Brascamp-Lieb inequality. In the context of learning theory, our
concentration-of-information result immediately yields high-probability
versions of many previous bounds that only hold in expectation.
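To make the dimension dependence concrete, consider the standard Gaussian N(0, I_d), a log-concave distribution with potential ||x||^2 / 2. Its information content is -log p(X) = (d/2) log(2 pi) + ||X||^2 / 2, whose variance is exactly d/2, so the fluctuations around the entropy grow like sqrt(d/2). The Monte Carlo sketch that follows checks this numerically; the function names are illustrative, and the example shows only this baseline behavior, not the paper's bound. Note that ||x||^2 / 2 is not exp-concave on all of R^d, so this example sits outside the exp-concave setting of the abstract.

```python
import numpy as np


def gaussian_information_content(x: np.ndarray) -> np.ndarray:
    """-log p(x) for the standard Gaussian N(0, I_d), one value per row of x."""
    d = x.shape[1]
    return 0.5 * d * np.log(2.0 * np.pi) + 0.5 * np.sum(x**2, axis=1)


def information_variance(d: int, n: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of Var[-log p(X)] for X ~ N(0, I_d)."""
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n, d))
    return float(np.var(gaussian_information_content(samples)))


if __name__ == "__main__":
    for d in (2, 20, 200):
        # The exact value is d / 2: the variance grows linearly with dimension.
        print(d, information_variance(d), d / 2)
```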
A Geometric View on Constrained M-Estimators
We study the estimation error of constrained M-estimators, and derive
explicit upper bounds on the expected estimation error determined by the
Gaussian width of the constraint set. We consider both the case where the true
parameter is on the boundary of the constraint set (matched constraint) and
the case where it lies strictly inside the constraint set (mismatched
constraint). For both cases, we derive novel universal
estimation error bounds for regression in a generalized linear model with the
canonical link function. Our error bound for the mismatched constraint case is
minimax optimal in terms of its dependence on the sample size for Gaussian
linear regression by the Lasso.
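Assuming the standard definition, the Gaussian width of a set T is w(T) = E[sup over x in T of <g, x>] with g ~ N(0, I_d). For the unit l1 ball (the constraint used by the constrained Lasso) the supremum equals ||g||_inf, and for the unit l2 ball it equals ||g||_2, so a Monte Carlo estimate takes only a few lines. The helper names in this sketch are hypothetical and the comparison is purely illustrative.

```python
import numpy as np


def gaussian_width_l1_ball(d: int, n: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of w(B_1^d); sup over the l1 ball of <g, x> is ||g||_inf."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, d))
    return float(np.abs(g).max(axis=1).mean())


def gaussian_width_l2_ball(d: int, n: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of w(B_2^d); sup over the l2 ball of <g, x> is ||g||_2."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal((n, d))
    return float(np.linalg.norm(g, axis=1).mean())


if __name__ == "__main__":
    d = 1000
    # The l1-ball width is at most sqrt(2 log(2d)); the l2-ball width is close to sqrt(d).
    print(gaussian_width_l1_ball(d), np.sqrt(2 * np.log(2 * d)))
    print(gaussian_width_l2_ball(d), np.sqrt(d))
```

The gap between the two widths (order sqrt(log d) versus order sqrt(d)) is why width-based bounds favor structured constraint sets such as the l1 ball.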
Let's be Honest: An Optimal No-Regret Framework for Zero-Sum Games
We revisit the problem of solving two-player zero-sum games in the
decentralized setting. We propose a simple algorithmic framework that
simultaneously achieves the best rates for honest regret as well as adversarial
regret, and in addition resolves the open problem of removing the logarithmic
terms in convergence to the value of the game. We achieve this goal in three
steps. First, we provide a novel analysis of the optimistic mirror descent
(OMD), showing that it can be modified to guarantee fast convergence for both
honest regret and value of the game, when the players are playing
collaboratively. Second, we propose a new algorithm, dubbed robust
optimistic mirror descent (ROMD), which attains optimal adversarial regret
without knowing the time horizon beforehand. Finally, we propose a simple
signaling scheme, which enables us to bridge OMD and ROMD to achieve the best
of both worlds. Numerical examples are presented to support our theoretical
claims and show that our non-adaptive ROMD algorithm can be competitive with OMD
with adaptive step-size selection.
Comment: Proceedings of the 35th International Conference on Machine Learning
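For intuition about the optimistic mirror descent (OMD) step discussed in the preceding abstract, here is a minimal sketch of OMD with the entropic mirror map (often called optimistic Hedge) for a matrix game min_x max_y x^T A y over probability simplices. Each player updates with 2 * (current loss) - (previous loss), i.e., it uses the last observed loss as a prediction of the next one. This is the textbook fixed-step variant, not the paper's modified OMD or ROMD; the function names, step size eta, horizon T, and starting points are illustrative assumptions. The duality gap of the averaged strategies equals the two players' summed regret divided by T, so it shrinks as T grows.

```python
import numpy as np


def optimistic_hedge(A, x0, y0, eta=0.1, T=2000):
    """Optimistic Hedge for min_x max_y x^T A y; returns averaged and last iterates."""
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    loss_prev = np.zeros(A.shape[0])  # previous loss vector of the row (minimizing) player
    gain_prev = np.zeros(A.shape[1])  # previous gain vector of the column (maximizing) player
    x_sum, y_sum = np.zeros_like(x), np.zeros_like(y)
    for _ in range(T):
        x_sum += x
        y_sum += y
        loss = A @ y    # row player's loss for each pure action
        gain = A.T @ x  # column player's gain for each pure action
        # Optimistic step: current feedback counted twice, previous feedback subtracted once.
        x = x * np.exp(-eta * (2 * loss - loss_prev))
        x /= x.sum()
        y = y * np.exp(eta * (2 * gain - gain_prev))
        y /= y.sum()
        loss_prev, gain_prev = loss, gain
    return x_sum / T, y_sum / T, x, y


def duality_gap(A, x, y):
    """max_j (A^T x)_j - min_i (A y)_i; nonnegative, and zero exactly at a Nash equilibrium."""
    return float(np.max(A.T @ x) - np.min(A @ y))


if __name__ == "__main__":
    # Rock-paper-scissors: the unique equilibrium is uniform play and the value is 0.
    A = np.array([[0.0, -1.0, 1.0], [1.0, 0.0, -1.0], [-1.0, 1.0, 0.0]])
    x_avg, y_avg, x_last, y_last = optimistic_hedge(A, [0.6, 0.3, 0.1], [0.2, 0.2, 0.6])
    print(duality_gap(A, x_avg, y_avg), duality_gap(A, x_last, y_last))
```

The starting points are deliberately non-uniform: starting both players at the uniform equilibrium would make every loss vector zero and the run trivial.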
A Dynamical System View of Langevin-Based Non-Convex Sampling
Non-convex sampling is a key challenge in machine learning, central to
non-convex optimization in deep learning as well as to approximate
probabilistic inference. Despite its significance, theoretically there remain
many important challenges: Existing guarantees (1) typically only hold for the
averaged iterates rather than the more desirable last iterates, (2) lack
convergence metrics that capture the scales of the variables such as
Wasserstein distances, and (3) mainly apply to elementary schemes such as
stochastic gradient Langevin dynamics. In this paper, we develop a new
framework that addresses the above issues by harnessing several tools from the
theory of dynamical systems. Our key result is that, for a large class of
state-of-the-art sampling schemes, their last-iterate convergence in
Wasserstein distances can be reduced to the study of their continuous-time
counterparts, which is much better understood. Coupled with standard
assumptions of MCMC sampling, our theory immediately yields the last-iterate
Wasserstein convergence of many advanced sampling schemes such as proximal,
randomized mid-point, and Runge-Kutta integrators. Beyond existing methods, our
framework also motivates more efficient schemes that enjoy the same rigorous
guarantees.
Comment: typos corrected, references added
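As a concrete instance of a discretized sampler and its continuous-time counterpart, the sketch that follows runs the unadjusted Langevin algorithm (ULA), i.e. the Euler-Maruyama discretization of the Langevin diffusion dX_t = -grad U(X_t) dt + sqrt(2) dW_t, on the non-convex double-well potential U(x) = (x^2 - 1)^2. It compares the last-iterate second moment across many parallel chains with the value obtained by numerically integrating exp(-U). The potential, step size h, and chain counts are illustrative assumptions; this is plain ULA, not one of the proximal, randomized mid-point, or Runge-Kutta schemes the abstract mentions.

```python
import numpy as np


def grad_U(x):
    """Gradient of the double-well potential U(x) = (x^2 - 1)^2."""
    return 4.0 * x * (x**2 - 1.0)


def ula_last_iterates(n_chains=5000, n_steps=2000, h=0.01, seed=0):
    """Run independent ULA chains from x = 0 and return their last iterates."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        # Euler-Maruyama step of dX = -grad U(X) dt + sqrt(2) dW.
        x = x - h * grad_U(x) + np.sqrt(2.0 * h) * rng.standard_normal(n_chains)
    return x


def target_second_moment():
    """E[x^2] under the density proportional to exp(-U), via a Riemann sum on a fine grid."""
    grid = np.linspace(-4.0, 4.0, 80_001)
    w = np.exp(-((grid**2 - 1.0) ** 2))
    # On a uniform grid the spacing cancels in the ratio of sums.
    return float(np.sum(grid**2 * w) / np.sum(w))


if __name__ == "__main__":
    samples = ula_last_iterates()
    print(np.mean(samples**2), target_second_moment())
```

The remaining discrepancy mixes Monte Carlo error with the O(h) bias of the discretization; shrinking h moves the chain's stationary law closer to that of the continuous-time diffusion.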
Continuous-time Analysis for Variational Inequalities: An Overview and Desiderata
Algorithms that solve zero-sum games, multi-agent multi-objective problems, or,
more generally, variational inequality (VI) problems are notoriously unstable
on general problems. Owing to the increasing need for solving such problems in
machine learning, this instability has been highlighted in recent years as a
significant research challenge. In this paper, we provide an overview of recent
progress in the use of continuous-time perspectives in the analysis and design
of methods targeting the broad VI problem class. Our presentation draws
parallels between single-objective problems and multi-objective problems,
highlighting the challenges of the latter. We also formulate various desiderata
for algorithms that apply to general VIs and we argue that achieving these
desiderata may profit from an understanding of the associated continuous-time
dynamics.
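A standard illustration of this instability is the bilinear game min_x max_y xy, i.e. the monotone VI with operator F(x, y) = (y, -x). Its continuous-time gradient descent-ascent flow rotates around the solution (0, 0) and preserves x^2 + y^2, whereas simultaneous gradient descent-ascent (GDA) multiplies x^2 + y^2 by 1 + eta^2 at every step and spirals outward. The extragradient method, which evaluates F at an extrapolated point, instead multiplies it by 1 - eta^2 + eta^4 < 1 for 0 < eta < 1. This textbook example (not taken from the paper) checks those factors numerically; the function names and step size are illustrative.

```python
import numpy as np


def operator(z):
    """VI operator F(x, y) = (df/dx, -df/dy) for f(x, y) = x * y."""
    x, y = z
    return np.array([y, -x])


def gda(z0, eta=0.1, steps=100):
    """Simultaneous gradient descent-ascent: z <- z - eta * F(z)."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * operator(z)
    return z


def extragradient(z0, eta=0.1, steps=100):
    """Extragradient: extrapolate first, then update with F at the extrapolated point."""
    z = np.asarray(z0, dtype=float)
    for _ in range(steps):
        z_half = z - eta * operator(z)  # extrapolation step
        z = z - eta * operator(z_half)  # actual update
    return z


if __name__ == "__main__":
    z0 = [1.0, 0.0]
    print(np.linalg.norm(gda(z0)))            # grows like (1 + eta^2)^(steps / 2)
    print(np.linalg.norm(extragradient(z0)))  # shrinks like (1 - eta^2 + eta^4)^(steps / 2)
```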
An Efficient Streaming Algorithm for the Submodular Cover Problem
We initiate the study of the classical Submodular Cover (SC) problem in the
data streaming model, which we refer to as the Streaming Submodular Cover (SSC)
problem.
We show that any single pass streaming algorithm using sublinear memory in the
size of the stream will fail to provide any non-trivial approximation
guarantees for SSC. Hence, we consider a relaxed version of SSC, where we only
seek to find a partial cover.
We design the first Efficient bicriteria Submodular Cover Streaming
(ESC-Streaming) algorithm for this problem, and provide theoretical guarantees
for its performance supported by numerical evidence. Our algorithm finds
solutions that are competitive with the near-optimal offline greedy algorithm
despite requiring only a single pass over the data stream. In our numerical
experiments, we evaluate the performance of ESC-Streaming on active set
selection and large-scale graph cover problems.
Comment: To appear in NIPS'1
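For intuition about the single-pass constraint, consider the coverage function f(S) = |union of A_i for i in S|, a standard monotone submodular function, with the partial-cover goal f(S) >= target. Offline greedy repeatedly picks the set with the largest marginal gain, which needs access to every set at every step; a single-pass rule can only keep or discard each set as it arrives, for example by keeping it when its marginal gain is at least a threshold tau. The threshold rule sketched here is a generic streaming heuristic chosen for illustration, not the ESC-Streaming algorithm, and all names and the toy instance are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def greedy_partial_cover(sets: Sequence[Set[int]], target: int) -> Tuple[List[int], Set[int]]:
    """Offline greedy: repeatedly add the set with the largest marginal coverage gain."""
    covered: Set[int] = set()
    chosen: List[int] = []
    while len(covered) < target:
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no set adds anything new, so the target is unreachable
        covered |= sets[best]
        chosen.append(best)
    return chosen, covered


def threshold_stream_cover(stream: Sequence[Set[int]], target: int, tau: int) -> Tuple[List[int], Set[int]]:
    """Single pass: keep a set iff its marginal gain is at least tau; stop once target is met."""
    covered: Set[int] = set()
    chosen: List[int] = []
    for i, s in enumerate(stream):
        if len(covered) >= target:
            break
        if len(s - covered) >= tau:
            covered |= s
            chosen.append(i)
    return chosen, covered


if __name__ == "__main__":
    sets = [{0, 1, 2}, {2, 3}, {3, 4, 5, 6}, {0, 6}, {7, 8}, {1, 7, 8, 9}]
    print(greedy_partial_cover(sets, target=8)[0])           # [2, 5]
    print(threshold_stream_cover(sets, target=8, tau=2)[0])  # [0, 2, 4]
```

On this toy instance the single-pass rule reaches the target with three sets where greedy uses two: the streaming rule trades solution size for a single pass, the kind of trade-off a bicriteria guarantee quantifies.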
Riemannian stochastic approximation algorithms
We examine a wide class of stochastic approximation algorithms for solving
(stochastic) nonlinear problems on Riemannian manifolds. Such algorithms arise
naturally in the study of Riemannian optimization, game theory and optimal
transport, but their behavior is much less understood compared to the Euclidean
case because of the lack of a global linear structure on the manifold. We
overcome this difficulty by introducing a suitable Fermi coordinate frame which
allows us to map the asymptotic behavior of the Riemannian Robbins-Monro (RRM)
algorithms under study to that of an associated deterministic dynamical system.
In so doing, we provide a general template of almost sure convergence results
that mirrors and extends the existing theory for Euclidean Robbins-Monro
schemes, despite the significant complications that arise due to the curvature
and topology of the underlying manifold. We showcase the flexibility of the
proposed framework by applying it to a range of retraction-based variants of
the popular optimistic/extra-gradient methods for solving minimization
problems and games, and we provide a unified treatment for their convergence.
Comment: 33 pages, 2 figures; a one-page abstract of this paper was presented
at COLT 202
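A minimal instance of a retraction-based Riemannian Robbins-Monro iteration: estimate the leading eigenvector of a symmetric matrix A by stochastic Riemannian gradient ascent of f(x) = x^T A x on the unit sphere. Each step projects a noisy Euclidean gradient onto the tangent space at x, moves with step size gamma_n = 1/(n + 10) (so the step sizes sum to infinity while their squares are summable), and retracts back to the sphere by normalization. The matrix, noise level, and step-size schedule are illustrative assumptions; the Fermi coordinate frame mentioned in the abstract is used for the analysis and is not needed to run this sketch.

```python
import numpy as np


def retract_sphere(x, v):
    """Projection retraction on the unit sphere: R_x(v) = (x + v) / ||x + v||."""
    y = x + v
    return y / np.linalg.norm(y)


def riemannian_rm_top_eigvec(A, x0, n_steps=5000, noise=0.5, seed=0):
    """Riemannian Robbins-Monro ascent of x^T A x on the sphere with noisy gradients."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for n in range(n_steps):
        # Noisy Euclidean gradient of f(x) = x^T A x.
        g = 2.0 * A @ x + noise * rng.standard_normal(x.shape)
        # Riemannian gradient: project onto the tangent space {v : <v, x> = 0}.
        rgrad = g - (x @ g) * x
        gamma = 1.0 / (n + 10)
        x = retract_sphere(x, gamma * rgrad)
    return x


if __name__ == "__main__":
    A = np.diag([3.0, 1.0, 0.5])  # leading eigenvector is e_1
    print(riemannian_rm_top_eigvec(A, np.ones(3)))
```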